all kriging varieties we have previously studied derive estimates using only one variable
universal kriging uses a predicting variable to model the mean (instead of assuming it to be constant and/or known)
but only the spatial (auto-)correlation of the target variable is used when kriging the deviation from the mean \(e(s)\) at unsampled locations \[Z(s) = X(s)\beta + e(s)\]
What if we could also make use of the (cross-)correlation between the target variable and one or more additional variables?
Cokriging Idea
instead of using only the target variable, cokriging allows auxiliary variables (co-variables) to be used for the prediction, provided they are correlated with the target variable
\(\hat{z}_1(s_0)\) is not only based on observations of \(Z_1\), but also on observations of variables \(Z_2, ..., Z_k\)
the prediction error variance is reduced by exploiting the cross-correlation between the variables
this is especially useful when the target variable is undersampled (e.g. because measuring this variable is expensive)
the co-variables may be measured at the same points as the target (co-located), at other points, or both
Cokriging is the multivariate equivalent of kriging, which is, in terms of the number of dependent variables, univariate.
Cases where this is useful
Multiple spatially correlated variables such as
chemical properties (auto-analyzers!)
sediment composition
electromagnetic spectra (imagery/remote sensing)
ecological data (abiotic factors; species abundances)
Two types of applications:
undersampled case: secondary variables help predict a primary variable, because we have more samples of them
equally sampled case: secondary variables don’t help prediction much, but we are interested in multivariate prediction, i.e. prediction error covariances.
Cokriging prediction
The estimator of \(Z_1\) based on a sample \(Z=(Z_1,Z_2,...,Z_k)\) is given by: \[\hat{z}_1(s_0) = \sum_{i=1}^n \sum_{j=1}^k \lambda_{ij} z_j(s_i)\]
The weights are chosen such that the variance of the estimation error is minimized and the estimator is unbiased (the weights of the target variable sum to one, those of each co-variable to zero),
where \(K_{ij}(s_u,s_v)\) denotes the auto-/cross-covariance of variable \(Z_i\) at location \(s_u\) with variable \(Z_j\) at location \(s_v\):
\(K_{ij}(s_u,s_v):={\rm Cov}(Z_i(s_u),Z_j(s_v))\)
The covariance matrix needs to be positive definite in order to yield valid solutions.
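The positive-definiteness requirement can be illustrated numerically. The following is a minimal sketch in plain NumPy (not gstat); it assumes a toy linear model of coregionalization in which all auto- and cross-covariances share one exponential correlation function, and the locations, the coregionalization matrix `B`, and the range parameter are all made up for illustration:

```python
import numpy as np

# Hypothetical example: k = 2 variables observed at 3 locations on a line.
# Under an assumed linear model of coregionalization, every
# auto-/cross-covariance uses the SAME spatial correlation function rho(h),
# scaled by a positive definite coregionalization matrix B.
locs = np.array([0.0, 1.0, 2.5])           # toy 1-D coordinates
B = np.array([[1.0, 0.6],                  # B[i, j] scales Cov(Z_i, Z_j);
              [0.6, 0.8]])                 # B must itself be positive definite
rho = lambda h: np.exp(-h / 2.0)           # exponential correlation model

# K[(i,u),(j,v)] = B[i,j] * rho(|s_u - s_v|)  ->  Kronecker product structure
H = np.abs(locs[:, None] - locs[None, :])  # pairwise distances
K = np.kron(B, rho(H))                     # full 6x6 covariance matrix

# A valid cokriging system needs K positive definite;
# the Cholesky factorization succeeds exactly in that case.
np.linalg.cholesky(K)                      # raises LinAlgError if not PD
print(K.shape)                             # -> (6, 6)
```

Building the joint covariance from one shared correlation function and a positive definite `B` is one standard way to guarantee a valid (positive definite) system.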
What is needed?
we need the spatial autocorrelation of the target variable (as in other kriging varieties) and of all auxiliary variables
we need the cross-correlation between the target variable and all auxiliary variables
How to get the cross-covariances between variables \(Z_1 ,Z_2 ,...,Z_k\)?
The main tool for estimating semivariances between different variables is the cross variogram, defined for collocated data as \[\gamma_{ij}(h) = \mbox{E}[(Z_i(s)-Z_i(s+h))(Z_j(s)-Z_j(s+h))]\] and for non-collocated data as \[\gamma_{ij}(h) = \mbox{E}[(Z_i(s)-m_i)(Z_j(s+h)-m_j)]\] with \(m_i\) and \(m_j\) the means of the respective variables.
Sample cross variograms are the obvious sums over the available pairs or cross pairs, as in one of \[\hat{\gamma}_{jk}(h)=\frac{1}{N_h}\sum_{i=1}^{N_h}(Z_j(s_i)-Z_j(s_i+h))(Z_k(s_i)-Z_k(s_i+h))\]\[\hat{\gamma}_{jk}(h)=\frac{1}{N_h}\sum_{i=1}^{N_h}(Z_j(s_i)-m_j)(Z_k(s_i+h)-m_k)\]
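The collocated estimator can be sketched in a few lines of NumPy. The data below are made up (two cross-correlated series on a regular 1-D transect), and the function follows the document's convention without a \(\frac{1}{2}\) factor:

```python
import numpy as np

# Sketch of the collocated sample cross variogram on a 1-D transect.
# Toy data: z1, z2 observed at the same regularly spaced locations.
rng = np.random.default_rng(0)
z1 = np.cumsum(rng.normal(size=50))        # two spatially smooth,
z2 = 0.7 * z1 + rng.normal(size=50)        # cross-correlated series

def cross_variogram(za, zb, lag):
    """gamma_ab(h): mean over pairs of (za(s)-za(s+h)) * (zb(s)-zb(s+h))."""
    da = za[:-lag] - za[lag:]              # increments of variable a at lag h
    db = zb[:-lag] - zb[lag:]              # increments of variable b at lag h
    return np.mean(da * db)

# The cross variogram typically grows with lag for cross-correlated variables
gammas = [cross_variogram(z1, z2, h) for h in (1, 5, 10)]
print(gammas)
```

Note the estimator is symmetric in the two variables: swapping `z1` and `z2` gives the same value.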
Cokriging - summary
takes additional information (auxiliary variables) into account
additional computational cost for estimating additional (cross-) variograms
target variable and auxiliary variables need to be correlated
cannot, in theory, be worse than ordinary kriging
if cross-correlation is 0, only the autocorrelation of the target variable is used
difference to universal kriging:
UK uses additional variables to estimate/model the mean
cokriging uses additional variables during the interpolation and thereby takes the (cross-)correlation into account
How to do this?
As multivariable analysis may involve numerous variables, we need to start organising the available information. For that reason, we collect all the observation data specifications in a gstat object, created by the function gstat. This function does nothing more than order (and, in fact, copy) the information needed later into a single object; for example, the specifications of four heavy metals from the meuse dataset can be collected this way.
Idea: Divide observations into training and validation sets.
Validation of interpolation
Often, a randomly sampled subset containing, e.g., 70% of the observations is taken as the training set, and the remaining observations are used for validation. We can use the training set to predict values at the validation locations and compare the predictions to the known values.
Cross Validation
A better idea could be to partition the data into \(k\) equally sized subsets and consecutively use each of the subsets as validation set. As a result, we get predictions at all observation locations, not only for one smaller sized validation set. This method is called k-fold cross-validation.
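The fold bookkeeping is the essential part of k-fold cross-validation and can be sketched without any geostatistics library; the interpolation step is left as a placeholder comment, and the helper name `kfold_indices` is made up for this example:

```python
import numpy as np

# Sketch: k-fold partitioning of n observation indices.
# Each observation lands in exactly one validation fold, so after looping
# over all folds we have a prediction at every observation location.
def kfold_indices(n, k, seed=42):
    idx = np.random.default_rng(seed).permutation(n)   # random order
    return np.array_split(idx, k)                      # k (near-)equal folds

folds = kfold_indices(155, 5)                          # e.g. meuse has n = 155
for fold in folds:
    train = np.setdiff1d(np.arange(155), fold)         # all other observations
    # ...fit variogram on `train`, krige at the locations of `fold`...
print([len(f) for f in folds])                         # -> [31, 31, 31, 31, 31]
```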
Special case 2-fold cross validation:
Randomly partition the data into two equally-sized subsets \(A\) and \(B\).
Use \(A\) as training set, i.e. use observations of \(A\) to interpolate values at locations of \(B\).
Use \(B\) as training set, i.e. use observations of \(B\) to interpolate values at locations of \(A\).
Compare predicted values to true observations at all observation locations
Special case n-fold cross validation:
This is also called Leave-one-out cross-validation (LOOCV).
Select a single observation \(z_i\)
Interpolate the value at the \(i\)-th observation location ignoring the true value \(z_i\)
Repeat both steps for all observations \(i = 1,..., n\)
Compare predicted values to true observations at all observation locations
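The LOOCV steps above can be sketched as follows. A simple inverse-distance interpolator stands in for kriging so the example stays self-contained, and all data are made up:

```python
import numpy as np

# Sketch of leave-one-out cross validation with a simple IDW interpolator
# (standing in for kriging to keep the example dependency-free).
rng = np.random.default_rng(1)
pts = rng.uniform(0, 100, size=(30, 2))    # toy observation locations
z = np.sin(pts[:, 0] / 20) + 0.1 * rng.normal(size=30)

def idw(train_pts, train_z, s0, power=2.0):
    d = np.linalg.norm(train_pts - s0, axis=1)
    w = 1.0 / d**power                     # inverse-distance weights
    return np.sum(w * train_z) / np.sum(w)

# Leave each observation out in turn and predict it from the remaining 29
preds = np.array([
    idw(np.delete(pts, i, axis=0), np.delete(z, i), pts[i])
    for i in range(len(z))
])
mae = np.mean(np.abs(z - preds))           # compare predictions to truth
print(round(mae, 3))
```

Replacing `idw` with a kriging predictor (refitting or reusing the variogram per fold) gives the geostatistical version of the same loop.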
Exhaustive Leave-\(p\)-out cross validation: Use all possible subsets of size \(p\) as validation datasets, i.e. consider \(n \choose p\) different validation subsets. This becomes infeasible in practice: \[
n=155, p=10 ~(\textrm{meuse dataset}) \Rightarrow \textrm{more than} ~ 10^{15} ~ \textrm{options}
\]
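That this count indeed exceeds \(10^{15}\) is quick to verify:

```python
from math import comb

# Exhaustive leave-p-out needs C(n, p) validation subsets.
n, p = 155, 10                             # meuse dataset, p = 10
print(comb(n, p) > 10**15)                 # -> True
```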
Instead of randomly dividing the data into folds, many researchers argue for a more target-oriented validation
Examples:
leave-location-out cross validation
leave-time-out cross validation
spatial cross validation
cross validation techniques based on distances in the feature space
Mean absolute error (MAE): \(\frac{1}{n} \sum_{i=1}^{n}\left| z_i - \hat{z}_i \right|\)
Correlation: \(\textrm{cor}(z,\hat{z})\)
…
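The two scalar measures above take only a few lines to compute; the observed values and cross-validation predictions below are made-up toy numbers:

```python
import numpy as np

# Sketch: scalar measures of fit for observations z and CV predictions z_hat.
z = np.array([2.1, 3.4, 1.8, 4.0, 2.9])     # toy observed values
z_hat = np.array([2.0, 3.0, 2.2, 3.6, 3.1]) # toy predictions

mae = np.mean(np.abs(z - z_hat))            # mean absolute error
cor = np.corrcoef(z, z_hat)[0, 1]           # Pearson correlation of z, z_hat
print(round(mae, 2), round(cor, 2))         # prints: 0.3 0.94
```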
Standard plots:
Predictions vs. observations
Residual histogram
…
Measures of fit
z-score:
To include the kriging variance, the \(\textrm{zscore}_i = \frac{z_i - \hat{z}_i}{\hat{\sigma}_i}\) can be computed for all observations. The z-score shows the difference between predicted and observed value in standard deviations.
Its mean and standard deviation should be close to 0 and 1 respectively if the variogram model fits well to the data. If its standard deviation is larger (smaller) than 1, the kriging standard error is underestimating (overestimating) the true interpolation error, on average.
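The z-score check can be sketched directly from its definition; the residuals and kriging standard errors below are toy numbers standing in for real kriging output:

```python
import numpy as np

# Sketch: z-scores from cross-validation residuals and kriging standard errors.
z = np.array([5.1, 4.3, 6.2, 5.8])         # toy observed values
z_hat = np.array([5.0, 4.6, 5.9, 6.0])     # toy cross-validation predictions
sigma = np.array([0.3, 0.4, 0.35, 0.25])   # toy kriging standard errors

zscore = (z - z_hat) / sigma               # residual in standard deviations
# mean ~ 0 and std ~ 1 indicate a well-fitting variogram model
print(zscore.mean().round(2), zscore.std(ddof=1).round(2))  # prints: -0.09 0.82
```

Here the standard deviation below 1 would suggest the kriging standard error slightly overestimates the true interpolation error.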
Example: Simple vs. Ordinary Kriging
We want to compare SK and OK to predict logarithmic zinc concentrations in the meuse dataset. We arbitrarily set the SK mean to 10 and use the same variogram models.
Computations can be slow (e.g. LOOCV for global Kriging with large \(n\))
Can be extended to multivariable settings (Co-kriging)
Cross-validation is also applicable to regression
Can be useful to analyze overfitting (e.g. for regression models)
CV cannot detect anything that is not in the data (e.g. it cannot say how well a selected variogram fits at distances smaller than those present in the data)
Refit variogram or not?
Spatial interpolation: Summary
Heuristic methods: IDW, Nearest neighbour
Geostatistical interpolation (Kriging)
Variograms (Experimental + theoretical model) are used to measure spatial dependencies between observation pairs
Simple Kriging: Constant known mean
Ordinary Kriging: Unknown constant mean
Universal Kriging: Varying unknown mean, include external variables in the trend
Block Kriging: Interpolation of averages inside particular areas / blocks
Co-Kriging: Include external variables in the spatial dependencies